Smart Paper Notes

Transfer Learning Approaches for Knowledge Discovery in Grid-based Geo-Spatiotemporal Data

Aishwarya Sarkar , Jien Zhang , Chaoqun Lu , Ali Jannesari

Category: Machine Learning

2021-10-02

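Extracting and carefully analyzing geo-spatiotemporal features is essential for recognizing the underlying causes of complex natural events such as floods. Limited evidence about the hidden factors that drive climate change makes predicting regional water discharge challenging. In addition, the explosive growth of complex geo-spatiotemporal environmental data, which state-of-the-art neural networks must relearn for every new region, underscores the need for new computationally efficient methods, advanced computing resources, and extensive training on large amounts of available monitoring data. We therefore propose an effectively reusable pretrained model that addresses the problem of transferring knowledge from one region to another by capturing their intrinsic geo-spatiotemporal variance. We further present four transfer learning approaches for spatiotemporal interpretability that improve Nash-Sutcliffe efficiency in new regions by 9% to 108% while reducing time by 95%.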
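
The Nash-Sutcliffe efficiency reported above is a standard hydrology score: 1 minus the ratio of squared prediction error to the variance of the observations, so 1.0 is a perfect fit and 0.0 is no better than predicting the observed mean. A minimal sketch follows; the function name and the plain-list inputs are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def nash_sutcliffe_efficiency(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Return NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) == 0 or len(observed) != len(simulated):
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Squared error of the model against the observations.
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Squared deviation of the observations from their own mean.
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("observed series has zero variance; NSE is undefined")
    return 1.0 - residual / variance
```

Because the score is unbounded below, a relative improvement above 100% (such as the 108% upper figure) is possible when the baseline score is small.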

translated by Google Translate

LogAnMeta: Log Anomaly Detection Using Meta Learning

Abhishek Sarkar , Tanmay Sen , Srimanta Kundu , Arijit Sarkar , Abdul Wazed

Category: Machine Learning | Machine Learning (Statistics)

2022-12-21

Modern telecom systems are monitored with performance and system logs from multiple application layers and components. Detecting anomalous events from these logs is key to identifying security breaches, resource over-utilization, critical/fatal errors, etc. Current supervised log anomaly detection frameworks tend to perform poorly on new types or signatures of anomalies with few or unseen samples in the training data. In this work, we propose a meta-learning-based log anomaly detection framework (LogAnMeta) for detecting anomalies from sequences of log events with few samples. LogAnMeta trains a hybrid few-shot classifier in an episodic manner. The experimental results demonstrate the efficacy of our proposed method.
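
Episodic training, as mentioned above, typically means drawing many small N-way K-shot tasks from the labelled data, each with a support set to adapt on and a query set to evaluate on. The sketch below shows only that sampling step. The function name, the `(sequence, label)` data layout, and the parameters are illustrative assumptions; the abstract does not describe LogAnMeta's actual sampler or classifier.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Example = Tuple[Hashable, str]  # (log event sequence, anomaly label)


def sample_episode(
    data: Sequence[Example],
    n_way: int,
    k_shot: int,
    n_query: int,
    rng: random.Random,
) -> Tuple[List[Example], List[Example]]:
    """Draw one N-way K-shot episode: disjoint support and query sets."""
    by_label: Dict[str, list] = defaultdict(list)
    for sequence, label in data:
        by_label[label].append(sequence)
    # Only classes with enough examples can fill both sets.
    eligible = sorted(l for l, items in by_label.items() if len(items) >= k_shot + n_query)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query examples")
    support: List[Example] = []
    query: List[Example] = []
    for label in rng.sample(eligible, n_way):
        picked = rng.sample(by_label[label], k_shot + n_query)
        support += [(seq, label) for seq in picked[:k_shot]]
        query += [(seq, label) for seq in picked[k_shot:]]
    return support, query
```

A training loop would call this repeatedly, fit or adapt the classifier on `support`, and compute the episode loss on `query`.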

Scene-aware Egocentric 3D Human Pose Estimation

Jian Wang , Lingjie Liu , Weipeng Xu , Kripasindhu Sarkar , Diogo Luvizon , Christian Theobalt

Category: Computer Vision

2022-12-20

Egocentric 3D human pose estimation with a single head-mounted fisheye camera has recently attracted attention due to its numerous applications in virtual and augmented reality. Existing methods still struggle in challenging poses where the human body is highly occluded or is closely interacting with the scene. To address this issue, we propose a scene-aware egocentric pose estimation method that guides the prediction of the egocentric pose with scene constraints. To this end, we propose an egocentric depth estimation network to predict the scene depth map from a wide-view egocentric fisheye camera while mitigating the occlusion of the human body with a depth-inpainting network. Next, we propose a scene-aware pose estimation network that projects the 2D image features and estimated depth map of the scene into a voxel space and regresses the 3D pose with a V2V network. The voxel-based feature representation provides a direct geometric connection between 2D image features and scene geometry, and further helps the V2V network constrain the predicted pose based on the estimated scene geometry. To enable the training of the aforementioned networks, we also generated a synthetic dataset, called EgoGTA, and an in-the-wild dataset based on EgoPW, called EgoPW-Scene. Experimental results on our new evaluation sequences show that the predicted 3D egocentric poses are accurate and physically plausible in terms of human-scene interaction, demonstrating that our method outperforms the state-of-the-art methods both quantitatively and qualitatively.

'If you build they will come': Automatic Identification of News-Stakeholders to detect Party Preference in News Coverage

Alapan Kuila , Sudeshna Sarkar

Category: Natural Language Processing | Artificial Intelligence

2022-12-17

The coverage of different stakeholders mentioned in news articles significantly impacts the slant or polarity detection of the concerned news publishers. For instance, pro-government media outlets would give more coverage to government stakeholders to increase their accessibility to news audiences. In contrast, anti-government news agencies would focus more on the views of opposing stakeholders to inform readers about the shortcomings of government policies. In this paper, we address the problem of stakeholder extraction from news articles and thereby determine the inherent bias present in news reporting. Identifying potential stakeholders in multi-topic news scenarios is challenging because each news topic has different stakeholders. The research presented in this paper utilizes both contextual information and external knowledge to identify the topic-specific stakeholders from news articles. We also apply a sequential incremental clustering algorithm to group the entities with similar stakeholder types. We carried out all our experiments on news articles about four Indian government policies published by numerous national and international news agencies. We further generalize our system, and the experimental results show that the proposed model can be extended to other news topics.
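
One common form of sequential incremental clustering is the single-pass "leader" scheme: each entity vector joins the most similar existing cluster if the similarity clears a threshold, and otherwise starts a new cluster. The sketch below illustrates that scheme under stated assumptions. The cosine similarity measure, the running-mean centroid update, the threshold, and the function names are all hypothetical; the abstract does not specify which variant the authors use.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def incremental_cluster(vectors: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Assign each vector, in arrival order, to a cluster index."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    assignments: List[int] = []
    for v in vectors:
        best, best_sim = -1, threshold
        for i, c in enumerate(centroids):
            sim = cosine(v, c)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best == -1:
            # No cluster is similar enough: start a new one.
            centroids.append(list(v))
            counts.append(1)
            best = len(centroids) - 1
        else:
            # Update the matched centroid as a running mean.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [c + (x - c) / n for c, x in zip(centroids[best], v)]
        assignments.append(best)
    return assignments
```

The result depends on arrival order, which is the usual trade-off of single-pass clustering: no cluster count is needed up front, but early items shape the centroids.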

Hippocampus-Inspired Cognitive Architecture (HICA) for Operant Conditioning

Deokgun Park , Md Ashaduzzaman Rubel Mondol , SM Mazharul Islam , Aishwarya Pothula

Category: Artificial Intelligence

2022-12-16

The neural implementation of operant conditioning with few trials is unclear. We propose a Hippocampus-Inspired Cognitive Architecture (HICA) as a neural mechanism for operant conditioning. HICA explains a learning mechanism in which agents can learn a new behavior policy in a few trials, as mammals do in operant conditioning experiments. HICA is composed of two different types of modules. One is a universal learning module type that represents a cortical column in the neocortex gray matter. The working principle is modeled as Modulated Heterarchical Prediction Memory (mHPM). In mHPM, each module learns to predict a succeeding input vector given the sequence of the input vectors from lower layers and the context vectors from higher layers. The prediction is fed into the lower layers as a context signal (top-down feedback signaling), and into the higher layers as an input signal (bottom-up feedforward signaling). Rewards modulate the learning rate in those modules to memorize meaningful sequences effectively. In mHPM, each module updates in a local and distributed way compared to conventional end-to-end learning with backpropagation of the single objective loss. This local structure enables the heterarchical network of modules. The second type is an innate, special-purpose module representing various organs of the brain's subcortical system. Modules modeling organs such as the amygdala, hippocampus, and reward center are pre-programmed to enable instinctive behaviors. The hippocampus plays the role of the simulator. It is an autoregressive prediction model of the top-most level signal with a loop structure of memory, while cortical columns are lower layers that provide detailed information to the simulation. The simulation becomes the basis for learning with few trials and the deliberate planning required for operant conditioning.

Balloon-to-Balloon AdHoc Wireless Network Connectivity: Google Project Loon

Aishwarya Srinivasan

Category: Artificial Intelligence

2022-12-13

Project Loon is a Google-initiated research project from the Google X Lab. The project focuses on providing remote internet access and network connectivity. The connectivity is established in vertical and horizontal space: vertical connectivity is between the Google Access Point (GAP) and the balloons, and between the balloons and antennas installed on land; horizontal connectivity is between the balloons. This research focuses on the connectivity between the balloons in a mesh network. The proposal focuses on implementing graphical methods such as the convex hull with ad hoc communication protocols. The proposed protocol includes content-based multicasting using angular sector division rather than grids, along with a dynamic core-based mesh protocol defining certain core active nodes and passive nodes forming the convex hull. The transmission (multicasting and broadcasting) between the nodes will be evaluated using the link probability, defined as the probability of the link between two nodes failing. Based on the link probability and node features, the best path between transmitting and receiving nodes will be evaluated.
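
One way to turn per-link failure probabilities into a "best path" is to look for the route with the highest end-to-end success probability. Assuming links fail independently, that probability is the product of (1 - p_fail) along the path, and maximizing it is equivalent to running Dijkstra's algorithm with edge weights of -log(1 - p_fail). The sketch below is illustrative only: the independence assumption, the function name, and the input layout are not from the abstract, and node features are ignored.

```python
import heapq
import math
from typing import Dict, List, Tuple


def most_reliable_path(
    fail_prob: Dict[Tuple[str, str], float], source: str, target: str
) -> Tuple[List[str], float]:
    """Return (path, success probability); ([], 0.0) if target is unreachable."""
    graph: Dict[str, List[Tuple[str, float]]] = {}
    for (u, v), p in fail_prob.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError("failure probability must lie in [0, 1]")
        if p == 1.0:
            continue  # a link that always fails is unusable
        weight = -math.log(1.0 - p)  # non-negative, additive along a path
        graph.setdefault(u, []).append((v, weight))
        graph.setdefault(v, []).append((u, weight))  # links are bidirectional
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        if u == target:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return [], 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-dist[target])
```

For example, with failure probabilities of 0.1 on A-B and B-C and 0.5 on the direct A-C link, the two-hop route succeeds with probability 0.81 and is preferred over the direct link.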

A Visual Active Search Framework for Geospatial Exploration

Anindya Sarkar , Michael Lanier , Scott Alfeld , Roman Garnett , Nathan Jacobs , Yevgeniy Vorobeychik

Category: Computer Vision | Artificial Intelligence

2022-11-28

Many problems can be viewed as forms of geospatial search aided by aerial imagery, with examples ranging from detecting poaching activity to human trafficking. We model this class of problems in a visual active search (VAS) framework, which takes as input an image of a broad area, and aims to identify as many examples of a target object as possible. It does this through a limited sequence of queries, each of which verifies whether an example is present in a given region. We propose a reinforcement learning approach for VAS that leverages a collection of fully annotated search tasks as training data to learn a search policy, and combines features of the input image with a natural representation of active search state. Additionally, we propose domain adaptation techniques to improve the policy at decision time when training data is not fully reflective of the test-time distribution of VAS tasks. Through extensive experiments on several satellite imagery datasets, we show that the proposed approach significantly outperforms several strong baselines. Code and data will be made public.

XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning

Pritam Sarkar , Ali Etemad

Category: Computer Vision

2022-11-25

We present XKD, a novel self-supervised framework to learn meaningful representations from unlabelled video clips. XKD is trained with two pseudo tasks. First, masked data reconstruction is performed to learn modality-specific representations. Next, self-supervised cross-modal knowledge distillation is performed between the two modalities through teacher-student setups to learn complementary information. To identify the most effective information to transfer and also to tackle the domain gap between audio and visual modalities which could hinder knowledge transfer, we introduce a domain alignment strategy for effective cross-modal distillation. Lastly, to develop a general-purpose solution capable of handling both audio and visual streams, a modality-agnostic variant of our proposed framework is introduced, which uses the same backbone for both audio and visual modalities. Our proposed cross-modal knowledge distillation improves linear evaluation top-1 accuracy of video action classification by 8.4% on UCF101, 8.1% on HMDB51, 13.8% on Kinetics-Sound, and 14.2% on Kinetics400. Additionally, our modality-agnostic variant shows promising results in developing a general-purpose network capable of handling different data streams. The code is released on the project website.

UMFuse: Unified Multi View Fusion for Human Editing applications

Rishabh Jain , Mayur Hemani , Duygu Ceylan , Krishna Kumar Singh , Jingwan Lu , Mausooom Sarkar , Balaji Krishnamurthy

Category: Computer Vision | Artificial Intelligence

2022-11-17

The vision community has explored numerous pose-guided human editing methods due to their extensive practical applications. Most of these methods still use an image-to-image formulation in which a single image is given as input to produce an edited image as output. However, the problem is ill-defined when the target pose is significantly different from the input pose. Existing methods then resort to in-painting or style transfer to handle occlusions and preserve content. In this paper, we explore the use of multiple views to minimize the issue of missing information and generate an accurate representation of the underlying human model. To fuse the knowledge from multiple viewpoints, we design a selector network that takes the pose keypoints and texture from images and generates an interpretable per-pixel selection map. After that, the encodings from a separate network (trained on a single-image human reposing task) are merged in the latent space. This enables us to generate accurate, precise, and visually coherent images for different editing tasks. We show the application of our network on two newly proposed tasks: multi-view human reposing and mix-and-match human image generation. Additionally, we study the limitations of single-view editing and scenarios in which multi-view provides a much better alternative.

Parameter and Data Efficient Continual Pre-training for Robustness to Dialectal Variance in Arabic

Soumajyoti Sarkar , Kaixiang Lin , Sailik Sengupta , Leonard Lausen , Sheng Zha , Saab Mansour

Category: Natural Language Processing | Machine Learning

2022-11-08

The use of multilingual language models for tasks in low- and high-resource languages has been a success story in deep learning. In recent times, Arabic has been receiving widespread attention on account of its dialectal variance. While prior research studies have tried to adapt these multilingual models for dialectal variants of Arabic, it remains a challenging problem owing to the lack of sufficient monolingual dialectal data and parallel translation data for such dialectal variants. Whether the limited dialectal data can be used to improve models trained on Arabic for its dialectal variants remains an open problem. First, we show that multilingual-BERT (mBERT) incrementally pretrained on Arabic monolingual data takes less training time and yields comparable accuracy when compared to our custom monolingual Arabic model, and beats existing models (average metric gain of +$6.41$). We then explore two continual pre-training methods: (1) using small amounts of dialectal data for continual fine-tuning and (2) using parallel Arabic-to-English data with a Translation Language Modeling loss function. We show that both approaches help improve performance on dialectal classification tasks ($+4.64$ average gain) when used on monolingual models.
